Executed: Tue Oct 11 12:00:15 2016
Duration: 488 seconds.
In [1]:
sigma = 0.016
time_window = 30
time_step = 5
time_start = -900
time_stop = 900
decimation = 20
t0_vary = True
true_params = dict(
tau = 60, # time constant
init_value = 0.3, # initial value (for t < t0)
final_value = 0.8, # final value (for t -> +inf)
t0 = 0) # time origin
num_sim_cycles = 1000
taus = (30, 60)
In [2]:
# Cell inserted during automated execution.
time_start = -900
num_sim_cycles = 1000
t0_vary = False
time_window = 30
taus = (5, 10, 30, 60)
decimation = 20
time_stop = 900
time_step = 5
true_params = {'init_value': 0.3, 't0': 0, 'tau': 60, 'final_value': 0.8}
sigma = 0.016
t = t0
).t = t0
).t0
) during the fittau
simulated during repeated fits (Monte-Carlo).This notebook fits simulated exponential transients with additive Gaissian noise in order to study time-constant fitting accuracy. In particular we compare a simple exponential model with a more realistic model with integration window, checking the effect on the fit results.
You can either run this notebook directly, or run it through the master notebook for batch processing.
In [3]:
%matplotlib inline
import numpy as np
import lmfit
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
In [4]:
import models # custom module
Models used to fit the data.
In this model, we define the model function as an exponential transient:
$$ y = f(t) = A \cdot e^{-t/\tau} + K$$The python function implementing it is:
models.exp_func()
.Next cell defines and initializes the fitting model (lmfit.model.Model
) including parameters' constrains:
In [5]:
labels = ('tau', 'init_value', 'final_value')
model = models.factory_model_exp(t0_vary=True)
A more realistic model needs to take into account that each data point is the result of an integration over a time window $w$:
$$f(t) = A \cdot e^{-t/\tau} + K$$$$y(t) = \int_{t}^{t+w} f(t')\;dt'$$In other words, when we process a measurement in time chunks, we are integrating a non-stationary signal $f(t)$ over a time window $w$. This integration causes a smoothing of $f(t)$, regardless of the fact that time is binned or is swiped-through with a moving windows (overlapping chunks).
Numerically, $t$ is discretized with step equal to (time_step / decimation)
.
The python function implementing this model function is:
models.expwindec_func()
.And, finally, we define and initialize the fitting model parameters' constrains:
In [6]:
modelw = models.factory_model_expwin(t_window=time_window, decimation=decimation, t0_vary=t0_vary)
These are the models used to generate the simulates (noisy) data.
In this simple model, we simulate random data $Y$ as an exponential decay plus additive Gaussian noise:
$$ Y(t_k) = f(t_k) + N_k $$$$ \{N_k\} \sim {\rm Normal}\{\mu=0; \sigma\}$$$$ \Delta t = t_k - t_{k-1} = \texttt{time_step}$$For the "integrating window" model, we first define a finer time axis $\theta_i$ which oversamples $t_k$ by a factor $n$. Then we define the function $Y_f$ adding Gaussian noise $\sqrt{n}\,N_i$, with $n$ times larger variance:
$$ Y_f(\theta_i) = f(\theta_i) + \sqrt{n}\,N_i $$$$ \Delta \theta = \theta_i - \theta_{i-1} = \texttt{time_step} \;/\; n$$Finally, by averaging each time window, we compute the data on the coarse time axis $t_k$:
$$ Y_w(t_k) = \frac{1}{m}\sum_{i} Y_f(\theta_i)$$Here, for each $t_k$, we compute the mean of $m$ consecutive $Y_f$ values. The number $m$ is chosen so that $m\, \Delta \theta$ is equal to the time window.
The amplitude of the additive noise ($\sigma$) is estimated from the experimental kinetic curves. In particular we take the variance from the POST period (i.e. the steady state period after the transient). The POST period has been chosen because it exhibits higher variance than the PRE period (i.e. the steady state period before the transient). These values have been calculated in 8-spot bubble-bubble kinetics - Summary.
In both models we define the noise amplitude as sigma
(see first cell):
sigma = 0.016
We also define the parameters for the time axis $t$:
time_start = -900 # seconds
time_stop = 900 # seconds
time_step = 5 # seconds
The simulated kinetic curve has the following parameters:
true_params = dict(
tau = 60, # time constant
init_value = 0.3, # initial value (for t < t0)
final_value = 0.8, # final value (for t -> +inf)
t0 = 0) # time origin
Here we simulate one kinetic curve and fit it with the two models (simple exponential and integrated exponential).
Time axis for simulated data:
In [7]:
t = np.arange(time_start, time_stop-time_window, time_step).astype(float)
t.size
Out[7]:
An ideal transient (no noise, no integration):
In [8]:
y = models.expwindec_func(t, t_window=time_window, **true_params)
y.shape
Out[8]:
A simulated transient (including noise + integration):
In [9]:
time_window, time_step
Out[9]:
In [10]:
yr = models.expwindec_func(t, t_window=time_window, sigma=sigma, **true_params)
yr.shape
Out[10]:
Plot the computed curves:
In [11]:
plt.plot(t, y, '-', label='model')
plt.plot(t, yr, 'o', label='model + noise')
Out[11]:
Fit the "Integrated Exponential" model:
In [12]:
#%%timeit
resw = modelw.fit(yr, t=t, tau=10, init_value=0.1, final_value=0.9, verbose=False)
Fit the "Simple Exponential" model:
In [13]:
#%%timeit
res = model.fit(yr, t=t + 0.5*time_window, tau=10, init_value=0.1, final_value=0.9, verbose=False)
Print and plot fit results:
In [14]:
fig = plt.figure(figsize=(14, 8))
res.plot(fig=fig)
ci = lmfit.conf_interval(res, res)
lmfit.report_fit(res)
print(lmfit.ci_report(ci, with_offset=False))
#plt.xlim(-300, 300)
In [15]:
fig = plt.figure(figsize=(14, 8))
resw.plot(fig=fig)
ci = lmfit.conf_interval(resw, resw)
lmfit.report_fit(resw)
print(lmfit.ci_report(ci, with_offset=False))
#plt.xlim(-300, 300)
The number simulation cycles is defined by num_sim_cycles
. Current value is:
In [16]:
num_sim_cycles
Out[16]:
The fixed kinetic curve parameters are:
In [17]:
{k: v for k, v in true_params.items() if k is not "tau"}
Out[17]:
While tau
is varied, taking the following values:
In [18]:
taus
Out[18]:
In [19]:
t0_vary
Out[19]:
In [20]:
def draw_samples_and_fit(true_params):
# Create the data
t = np.arange(time_start, time_stop-time_window, time_step).astype(float)
yr = models.expwindec_func(t, t_window=time_window, sigma=sigma, decimation=100, **true_params)
# Fit the model
tc = t + 0.5*time_window
kws = dict(fit_kws=dict(nan_policy='omit'), verbose=False)
res = model.fit(yr, t=tc, tau=90, method='nelder', **kws)
res = model.fit(yr, t=tc, **kws)
resw = modelw.fit(yr, t=t, tau=400, decimation=decimation, method='nelder', **kws)
resw = modelw.fit(yr, t=t, decimation=decimation, **kws)
return res, resw
In [21]:
def monte_carlo_sim(true_params, N):
df1 = pd.DataFrame(index=range(N), columns=labels)
df2 = df1.copy()
for i in range(N):
res1, res2 = draw_samples_and_fit(true_params)
for var in labels:
df1.loc[i, var] = res1.values[var]
df2.loc[i, var] = res2.values[var]
return df1, df2
In [22]:
mc_results1, mc_results2 = [], []
In [23]:
%%timeit -n1 -r1 # <-- prints execution time
for tau in taus:
true_params['tau'] = tau
df1, df2 = monte_carlo_sim(true_params, num_sim_cycles)
mc_results1.append(df1)
mc_results2.append(df2)
In [24]:
for tau, df in zip(taus, mc_results1):
true_params['tau'] = tau
fig, ax = plt.subplots(1, 3, figsize=(16, 4))
for i, var in enumerate(labels):
std = df[var].std()
df[var].hist(bins=30, ax=ax[i])
ax[i].set_title("%s = %.1f (%.3f)" % (var, true_params[var], std), fontsize=18)
ax[i].axvline(true_params[var], color='r', ls='--')
#print('True parameters: %s' % true_params)
In [25]:
for tau, df in zip(taus, mc_results2):
true_params['tau'] = tau
fig, ax = plt.subplots(1, 3, figsize=(16, 4))
for i, var in enumerate(labels):
std = df[var].std()
df[var].hist(bins=30, ax=ax[i])
ax[i].set_title("%s = %.1f (%.3f)" % (var, true_params[var], std), fontsize=18)
ax[i].axvline(true_params[var], color='r', ls='--')
#print('True parameters: %s' % true_params)
The last two multipanel figures compare the fitting accuracy
of the model parameter for the simple-exponential and integrated-exponential models.
We note that, in particular for the tau
parameter,
the integrated exponential model is significantly more accurate,
providing good estimates at much smaller integration times.
This comparison demonstrates empirically the strong advantage in using the theoretically more correct integrated exponential model.